Decorrelating Inputs
To access the Correlation Matrix:
- Click the Field Correlation button on the Visualization view,
or
- click the Field Correlation button on the Model Input Selection dialog (displayed after clicking the Construct model button on the Modeling view).
Correlation Matrix Input selection
Clicking the Field Correlation button displays a field selection dialog. Select the fields you wish to include in the Correlation Matrix by using the add (>), remove (<), add all (>>) and remove all (<<) buttons, then click OK.
NOTE: Fields not selected will not be included in the Correlation Matrix view.
The Correlation Matrix View
The Correlation Matrix view uses a matrix to display every selected input field and its correlation with all other fields in the data set.
Color coding
The table uses color coding to indicate the percentage of correlation between the fields.
- Highly correlated variables are shown in red (correlation above 90% or below -90%).
- Semi-correlated variables are shown in blue (correlation between 75% and 90%, or between -75% and -90%).
- Rows and columns containing variables marked as correlated are shown with a grey background.
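The color rules above can be sketched as a small function. This is only an illustration, not the application's own code: the function name, the parameter names and the treatment of values that fall exactly on 75% or 90% are assumptions.

```python
def classify_correlation(percent, semi=75.0, high=90.0):
    """Return the cell color for a correlation expressed in percent.

    Hypothetical helper: 'semi' and 'high' default to the 75% and 90%
    limits quoted above. Boundary handling is an assumption.
    """
    magnitude = abs(percent)  # the sign only gives the direction of the relationship
    if magnitude > high:
        return "red"          # highly correlated
    if magnitude >= semi:
        return "blue"         # semi-correlated
    return None               # no color coding


for value in (95.0, -92.5, 80.0, -75.0, 10.0):
    print(value, classify_correlation(value))
```

Working on the absolute value means one pair of positive limits covers both positive and negative correlations.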
Setting the color range
The ranges used for color coding the correlations can be changed by clicking the Ranges button. Only positive values can be entered, and decimal numbers are allowed, e.g. 75.44. Negative values in the matrix indicate the direction (negative gradient) of the relationship between the two variables, so each range applies to positive and negative correlations alike. The values displayed in the correlation matrix are rounded to one decimal place; for example, a value of 3.27 is displayed as 3.3.
NOTE: The Correlation Matrix view displays the entire correlation matrix, which is mirrored across its diagonal. Each field is shown on the diagonal as having a 100% correlation with itself.
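To illustrate why the matrix is mirrored and why the diagonal reads 100%, here is a minimal sketch that builds such a matrix. It assumes the Pearson correlation coefficient expressed as a percentage. This document does not state which measure the application uses, and the field names and data are invented for the example.

```python
import math


def pearson_percent(xs, ys):
    """Pearson correlation of two equal-length numeric lists, in percent."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant field has zero variance and would divide by zero here.
    return 100.0 * cov / math.sqrt(var_x * var_y)


def correlation_matrix(fields):
    """Build the full (mirrored) matrix as a dict of dicts, one decimal place."""
    names = list(fields)
    return {
        a: {b: round(pearson_percent(fields[a], fields[b]), 1) for b in names}
        for a in names
    }


# Hypothetical sample data
fields = {
    "temp": [1, 2, 3, 4],
    "flow": [4, 3, 2, 1],
    "level": [2, 1, 4, 3],
}
matrix = correlation_matrix(fields)
for name, row in matrix.items():
    print(name, row)
```

Because the correlation of A with B equals the correlation of B with A, the cells above the diagonal repeat the cells below it.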
Continue by inspecting all highly correlated variables and marking them as correlated. The aim is to keep marking highly correlated variables until no red cells remain on the view.
NOTE: The fields marked as highly correlated are not removed from the data set. They are simply flagged as correlated in order to assist the user in selecting the proper inputs during model construction.
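The marking exercise is performed by hand in the application. As a rough sketch of the idea, the loop below repeatedly marks the variable that still has the most red cells until none remain. This greedy strategy, the function name and the sample matrix are all assumptions made for illustration and do not describe a feature of the product.

```python
def mark_correlated(matrix, high=90.0):
    """Return the set of variables to mark so that no red cells remain.

    'matrix' is a dict of dicts of correlations in percent. A cell counts
    as red when its magnitude exceeds 'high' and neither variable is marked.
    """
    marked = set()
    while True:
        # Count the red cells each unmarked variable still takes part in.
        red_counts = {
            a: sum(
                1
                for b in matrix[a]
                if b != a and b not in marked and abs(matrix[a][b]) > high
            )
            for a in matrix
            if a not in marked
        }
        worst = max(red_counts, key=red_counts.get, default=None)
        if worst is None or red_counts[worst] == 0:
            return marked  # nothing red is left
        marked.add(worst)


# Hypothetical matrix: "a" is highly correlated with both "b" and "c"
example = {
    "a": {"a": 100.0, "b": 95.0, "c": -93.0},
    "b": {"a": 95.0, "b": 100.0, "c": 50.0},
    "c": {"a": -93.0, "b": 50.0, "c": 100.0},
}
print(mark_correlated(example))
```

In this example marking "a" alone clears every red cell, so "b" and "c" stay available as model inputs. In practice the choice of which variable in a pair to mark is a modeling decision rather than a mechanical one.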
To mark a variable as correlated:
- Select the row containing the variable you wish to mark.
- Click the Mark button to mark the variable as being correlated.
To unmark a variable previously marked as correlated:
- Select the row containing the variable you wish to unmark.
- Click the Unmark button to reset the variable.
To unmark or clear all previously marked variables:
- Click the Clear button to unmark all variables.
Exporting the correlation matrix
To export the correlation matrix as a comma-separated values (CSV) file, click the Export button.
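For readers who want to post-process such a file, the sketch below shows one plausible CSV layout for a correlation matrix, with a header row of field names followed by one row per field. The actual layout of the exported file is not described here, so the column arrangement and the "Field" header are assumptions.

```python
import csv
import io


def matrix_to_csv(matrix):
    """Serialize a dict-of-dicts correlation matrix to CSV text."""
    names = list(matrix)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Field"] + names)  # header row (assumed layout)
    for a in names:
        writer.writerow([a] + [matrix[a][b] for b in names])
    return buffer.getvalue()


# Hypothetical two-field matrix, values in percent
sample = {
    "a": {"a": 100.0, "b": -93.2},
    "b": {"a": -93.2, "b": 100.0},
}
print(matrix_to_csv(sample))
```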
Using field categories
The field correlation matrix can show or hide variables that have been categorised using the Continuous Troubleshooter variable classification option. Select or de-select the options below the correlation matrix to show or hide the corresponding field categories.